Limit session state during metadata queries in Iceberg #19757

findepi · 2023-11-15T17:07:05Z

Metadata queries such as `information_schema.columns`,
`system.jdbc.columns` or `system.metadata.table_comments` may end up
loading an arbitrary number of relations within a single query (transaction).
It is important to bound memory usage for such queries.

For the Iceberg Hive metastore based catalog, this is already done in
`TrinoHiveCatalogFactory` by configuring a per-query
`CachingHiveMetastore`. However, catalogs with explicit caching need
something similar.
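
To illustrate what "bounding" a per-query cache means here, the following is a minimal sketch, not the code from this PR. `BoundedQueryCache` and its methods are hypothetical names, and least-recently-used eviction is an assumption chosen for brevity. The idea is that the cache stops growing once it reaches a fixed size, so a metadata query that touches many relations cannot exhaust memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-query cache with a size bound; not a Trino class.
final class BoundedQueryCache<K, V>
{
    private final Map<K, V> entries;

    BoundedQueryCache(int maximumSize)
    {
        // accessOrder = true keeps the least recently used entry first,
        // and removeEldestEntry drops it once the bound is exceeded.
        this.entries = new LinkedHashMap<>(16, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
            {
                return size() > maximumSize;
            }
        };
    }

    // Returns the cached value, loading and caching it on a miss.
    synchronized V get(K key, Function<K, V> loader)
    {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    // Explicit eviction, for example after a table is dropped.
    synchronized void invalidate(K key)
    {
        entries.remove(key);
    }

    synchronized int size()
    {
        return entries.size();
    }
}
```

With a bound in place, an evicted entry is simply reloaded on its next access, which trades some repeated loading for predictable memory use.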



pajaks · 2023-11-16T13:12:02Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/TrinoGlueCatalog.java

@@ -689,6 +690,7 @@ public void dropCorruptedTable(ConnectorSession session, SchemaTableName schemaT
        }
        String tableLocation = metadataLocation.replaceFirst("/metadata/[^/]*$", "");
        deleteTableDirectory(fileSystemFactory.create(session), schemaTableName, tableLocation);
+        invalidateTableCache(schemaTableName);


Can this be moved to dropTableFromMetastore, or even to deleteTable, to simplify the code and to avoid forgetting it if new methods are added in the future?

I thought about it. Technically it would work, but I considered dropTableFromMetastore to be just a technical operation, which may or may not be invoked, and may or may not be the last operation in the drop flow.

pajaks · 2023-11-16T13:15:21Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/TrinoHiveCatalog.java

@@ -365,6 +365,7 @@ private static Optional<String> getQueryId(io.trino.plugin.hive.metastore.Table
    public void unregisterTable(ConnectorSession session, SchemaTableName schemaTableName)
    {
        dropTableFromMetastore(schemaTableName);
+        invalidateTableCache(schemaTableName);


Can this and the following be moved to dropTableFromMetastore?

#19757 (comment)
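
The pattern discussed in these two review threads can be sketched as follows. This is a simplified illustration rather than the PR's code: `SketchCatalog`, its string-based table names, and its in-memory maps are hypothetical stand-ins. Only the call order (drop from the metastore, then invalidate the session-level cache in each public operation) mirrors the diffs above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a catalog with a session-level table cache.
final class SketchCatalog
{
    // Table name -> metadata location, standing in for the real metastore.
    private final Map<String, String> metastore = new HashMap<>();
    // Session-level cache of loaded tables.
    private final Map<String, String> tableCache = new HashMap<>();

    void createTable(String name, String metadataLocation)
    {
        metastore.put(name, metadataLocation);
    }

    Optional<String> loadTable(String name)
    {
        String cached = tableCache.get(name);
        if (cached != null) {
            return Optional.of(cached);
        }
        String loaded = metastore.get(name);
        if (loaded != null) {
            tableCache.put(name, loaded);
        }
        return Optional.ofNullable(loaded);
    }

    // Each public drop-like operation invalidates the cache explicitly,
    // instead of relying on the low-level metastore helper to do it.
    void dropTable(String name)
    {
        dropTableFromMetastore(name);
        invalidateTableCache(name);
    }

    void unregisterTable(String name)
    {
        dropTableFromMetastore(name);
        invalidateTableCache(name);
    }

    private void dropTableFromMetastore(String name)
    {
        metastore.remove(name);
    }

    private void invalidateTableCache(String name)
    {
        tableCache.remove(name);
    }
}
```

Without the `invalidateTableCache` call, `loadTable` would keep returning the cached entry for a table that no longer exists in the metastore.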

findepi · 2023-11-16T13:50:29Z

CI #16315 (#8920), #15187
retrying

mosabua · 2023-11-17T00:05:27Z

No release note entry @findepi ?

alexjo2144 · 2023-11-17T15:53:40Z

My understanding is that the caching here is important for ensuring that queries use the same snapshot in different phases of query execution. It's unlikely that a regular select would fill this cache but it'd be nice if we could have them be unbounded when necessary.

findepi · 2023-11-17T16:22:20Z

I see your point and agree.
I allowed myself to be a bit lazy here.
The pattern has existed in the Hive connector for a long time. It's configurable there (hive.per-transaction-metastore-cache-maximum-size), but I don't recall people needing to configure this. Of course I agree it would be better to have a configuration toggle for this.

We should probably also update the JDBC connector. DefaultJdbcMetadataFactory creates CachingJdbcClient with an unbounded cache, so it can OOM on large metadata queries.
cc @hashhar @kokosing

findepi added 3 commits November 15, 2023 17:42

Remove unused method

ae28d45

Fix REST Iceberg catalog for names with dots

7789b7a

Let it differentiate between "a"."b.c" and "a.b"."c" tables.

Evict session level cache when dropping Iceberg table

5307295
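
The "names with dots" commit above addresses an ambiguity that can be sketched in general terms. The commit message does not say how the REST catalog fix works, so the following only illustrates the problem. `TableKey` and `DottedNames` are hypothetical names. Joining schema and table with a dot loses information, while a structured key keeps `"a"."b.c"` and `"a.b"."c"` distinct.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical structured key; equality compares schema and table separately.
record TableKey(String schema, String table)
{
    // Lossy form: both "a"."b.c" and "a.b"."c" become "a.b.c".
    String dotted()
    {
        return schema + "." + table;
    }
}

final class DottedNames
{
    private DottedNames() {}

    // Counts distinct entries when the dotted string is used as the map key.
    static int countWithStringKeys(TableKey... keys)
    {
        Map<String, TableKey> byString = new HashMap<>();
        for (TableKey key : keys) {
            byString.put(key.dotted(), key);
        }
        return byString.size();
    }

    // Counts distinct entries when the structured key is used directly.
    static int countWithStructuredKeys(TableKey... keys)
    {
        Map<TableKey, TableKey> byKey = new HashMap<>();
        for (TableKey key : keys) {
            byKey.put(key, key);
        }
        return byKey.size();
    }
}
```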

findepi requested review from ebyhr, alexjo2144 and pajaks November 15, 2023 17:07

cla-bot bot added the cla-signed label Nov 15, 2023

github-actions bot added the iceberg Iceberg connector label Nov 15, 2023

findepi force-pushed the findepi/iceberg-oom-limit branch from a1ab9d5 to 2c3e111 Compare November 15, 2023 21:23

findepi force-pushed the findepi/iceberg-oom-limit branch from 2c3e111 to ad51682 Compare November 15, 2023 21:40

pajaks reviewed Nov 16, 2023

pajaks approved these changes Nov 16, 2023

findepi merged commit 8d4f26b into master Nov 16, 2023
96 of 97 checks passed

findepi deleted the findepi/iceberg-oom-limit branch November 16, 2023 15:25

github-actions bot added this to the 434 milestone Nov 16, 2023

mosabua mentioned this pull request Nov 17, 2023

Add Trino 434 release notes #19764

Merged

findepi added the no-release-notes This pull request does not require release notes entry label Nov 17, 2023
